Bridging the Inflection Morphology Gap for Arabic Statistical Machine Translation

Authors

Andreas Zollmann

Ashish Venugopal

Stephan Vogel

Abstract

Statistical machine translation (SMT) is based on the ability to effectively learn word and phrase relationships from parallel corpora, a process that is considerably more difficult when the extent of morphological expression differs significantly across the source and target languages. We present techniques that select appropriate word segmentations in the morphologically rich source language based on contextual relationships in the target language. Our results take advantage of existing word-level morphological analysis components to improve translation quality above the state of the art on a limited-data Arabic-to-English speech translation task.

Similar resources

Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation

The Arabic language has far richer systems of inflection and derivation than English, which has very little morphology. This morphological difference causes a large gap between the vocabulary sizes in any given parallel training corpus. Segmentation of inflected Arabic words is a way to smooth its highly morphological nature. In this paper, we describe some statistically and linguistically motivate...

Applying Morphology Generation Models to Machine Translation

We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages. Our inflection generation models are trained independently of the SMT system. We investigate different ways of combining the inflection prediction component with the SMT syst...

Translate, Predict or Generate: Modeling Rich Morphology in Statistical Machine Translation

We compare three methods of modeling morphological features in statistical machine translation (SMT) from English to Arabic, a morphologically rich language. Features can be modeled as part of the core translation process mapping source tokens to target tokens. Alternatively these features can be generated using target monolingual context as part of a separate generation (or post-translation in...

Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum-entropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weight...

Using Linguistic Knowledge in Statistical Machine Translation

In this thesis, we present methods for using linguistically motivated information to enhance the performance of statistical machine translation (SMT). One of the advantages of the statistical approach to machine translation is that it is largely language-agnostic. Machine learning models are used to automatically learn translation patterns from data. SMT can, however, be improved by using lingui...

Publication date: 2006
